QBiC

Project Members:

Prof. Dr. Verena Keitel

verena.keitel@med.uni-duesseldorf.de

Universitätsklinikum Düsseldorf


Dr. Maria Reich

maria.reich@uni-duesseldorf.de

Universitätsklinikum Düsseldorf


Prof. Dr. Mathias Heikenwälder

m.heikenwaelder@dkfz-heidelberg.de

DKFZ Heidelberg




QBiC contacts:

Dr. Gisela Gabernet

Bioinformatics project manager

gisela.gabernet@qbic.uni-tuebingen.de


1 Introduction

2 Loading the dataset

Loading all the individual samples.

Merging all samples in the dataset. The table represents the number of cells that are available for each sample.

Condition Cell_number
SI-GA-C1 3373
SI-GA-C2 3562
SI-GA-C3 8168
SI-GA-C4 7833
SI-GA-C5 5290
SI-GA-C6 7010
SI-GA-C7 5179
SI-GA-C8 6188

Adding sample conditions. The table represents the number of cells that are available for each condition.

Condition Cell_number
MDR2_KO 27368
WT 19235

3 Standard pre-processing workflow

Important QC params for eliminating bad quality cells (could be droplets without cells) are:

  • Number of unique genes detected in each cell
  • Total number of molecules detected within a cell

Calculating the percentage of genes mapping to mitochondrial genome for QC:

  • Low-quality / dying cells often exhibit extensive mitochondrial contamination
  • We calculate mitochondrial QC metrics with the PercentageFeatureSet function, which calculates the percentage of counts originating from a set of features
  • We use the set of all genes starting with mt- as a set of mitochondrial genes

Visualization of the QC metrics:

  • recommended to filter out the cells that have unique feature counts over 2500 or less than 200
  • recommended to filter out cells that have >10% mitochondrial counts

3.1 Filtering out low quality cells

Low quality cells need to be filtered out:

  • Filtering out the cells that have unique feature counts over 4000 or less than 200
  • Filtering out cells that have >10% mitochondrial counts

After filtering low quality cells, the cell numbers per sample are the following:

Condition Cell_number
SI-GA-C1 2668
SI-GA-C2 2317
SI-GA-C3 7327
SI-GA-C4 6248
SI-GA-C5 4905
SI-GA-C6 6591
SI-GA-C7 4523
SI-GA-C8 5754

3.2 Data normalization

By default, we employ a global-scaling normalization method "LogNormalize" that normalizes the feature expression measurements for each cell by the total expression, multiplies this by a scale factor (10,000 by default), and log-transforms the result.

3.3 Detection of highly variable features

Detection of highly variable features. The displayed gene names represent the top 10 most variable genes.

## When using repel, set xnudge and ynudge to 0 for optimal results

3.4 Linear dimensionality reduction

Applies PCA to the highly variable features, after a standard scaling of the features to a mean of 0 and SD of 1.

## Centering and scaling data matrix
## PC_ 1 
## Positive:  Aldob, Gsta3, Stard10, Rida, Chchd10, Ttc36, Gstm1, Fbp1, Gnmt, Hpd 
##     Bhmt, Ttr, Ass1, Serpina1c, Car3, Ddt, Cdo1, Phyh, Serpina1b, Urah 
##     Uox, Prdx6, Apoc3, Fabp1, Rbp4, Akr1c6, Serpina1a, Scp2, Apoa2, Rgn 
## Negative:  Tmsb4x, Psap, Ctsb, Ctsd, Lgmn, Maf, Mrc1, Gatm, Srgn, Fcer1g 
##     Hexa, Zeb2, Dab2, Cst3, Ifi203, Ucp2, Man2b1, Grn, Cdh5, Klf2 
##     Cotl1, Npc2, Mndal, Cyba, Sdc3, Cd38, Ptgs1, Gngt2, Tgfb1, Eng 
## PC_ 2 
## Positive:  Csf1r, C1qc, Ctss, C1qa, C1qb, Ctsc, Wfdc17, Tyrobp, Lyz2, Mpeg1 
##     Clec4f, Cd5l, Aif1, Cd68, Vsig4, Laptm5, Cd74, Fcgr3, Cybb, Cfp 
##     Acp5, Spi1, Fyb, Ccl24, Folr2, Ccl6, Cd300a, Creg1, Ifi30, Adgre1 
## Negative:  Ptprb, Ehd3, Aqp1, Kdr, Clec4g, Dnase1l3, Gpihbp1, Eng, Ramp2, Fam167b 
##     Gpr182, Adgrl4, Adgrf5, Egfl7, F8, Fabp4, Plpp1, Fcgr2b, Bmp2, Plxnc1 
##     Lifr, Oit3, Plpp3, Nrp1, Clec14a, Flt1, Gng11, Cd300lg, Tspan7, Cd55 
## PC_ 3 
## Positive:  Fabp1, Apoa2, Car3, Gnmt, Hpd, Ass1, Bhmt, Apoc3, Rgn, Uox 
##     Otc, Arg1, Adh1, Tdo2, Akr1c6, Wfdc21, Hpgd, Thrsp, Cth, Gpx1 
##     Serpina1c, Apoc4, Haao, Ttc36, Mat1a, Sord, Acaa1b, Serpina3k, Mup3, Cyb5a 
## Negative:  Epcam, Spp1, Clu, Tm4sf4, Cp, Sox4, Tmem45a, Cldn3, Bicc1, Cldn7 
##     S100a11, Atp1b1, Alcam, Mfge8, Dsg2, Tmsb10, Sox9, Cd24a, Ambp, Sorbs2 
##     Gas6, Ccn1, Wfdc2, Jun, Dcdc2a, Krt19, Cxadr, Krt18, Krt8, Sdc4 
## PC_ 4 
## Positive:  Spp1, Epcam, Tm4sf4, Tm4sf1, Dnase1l3, Clec4g, Kdr, Dpp4, Gpihbp1, Gpr182 
##     Egfl7, Ptprb, Cldn3, Fam167b, Mrc1, Fabp5, Aqp1, Fcgr2b, Ccnd1, Stab2 
##     Cldn7, Adgrl4, Oit3, Clu, Ambp, Cemip2, Adgrf5, F8, Cd24a, Cd300lg 
## Negative:  Col3a1, Lum, Cygb, Col1a2, Col6a1, Col1a1, Smoc2, Sparcl1, Dcn, Dpt 
##     Islr, Hand2os1, Col6a3, Gdf10, Olfml3, Mmp2, Col6a2, Col14a1, Fam180a, Tcf21 
##     Fbln5, Cpxm1, Gsn, Cxcl12, Adamtsl2, Mfap4, Angptl2, Pdgfrb, Adamts2, Loxl1 
## PC_ 5 
## Positive:  Lsp1, Cd52, Ptprcap, Cd79a, Coro1a, Cd37, Napsa, Iglc3, Cd79b, Ighm 
##     Ms4a1, Gimap3, Iglc2, Cd2, Mzb1, Ebf1, Sept1, Cytip, H2-DMb2, Cd69 
##     Fcmr, Siglecg, Igkc, Gimap7, Itgb7, Dusp2, Rnase6, St8sia4, Ms4a4b, Fcrla 
## Negative:  Apoe, Slc40a1, Cd5l, Fcna, Clec4f, Vsig4, Folr2, Ccl24, Timd4, C6 
##     Hpgd, Ccr3, Cfp, Sdc3, Il18bp, Hmox1, Ctsb, Wfdc17, Chp2, Lgmn 
##     Csf1r, Creg1, Cd163, Grn, C1qc, Vcam1, Lilra5, Axl, Siglec1, Abcc3

## Saving 7 x 5 in image
## Saving 7 x 5 in image

3.5 Determine dimensionality of data set

Determining number of PCAs to consider for clustering from Elbow plot. It is recommended to go rather on the higher end of PCAs.

## Saving 7 x 5 in image
## Saving 7 x 5 in image
  • Number of chosen PCs: 15.

4 Cell clustering

As in PhenoGraph, we first construct a KNN graph based on the euclidean distance in PCA space, and refine the edge weights between any two cells based on the shared overlap in their local neighborhoods (Jaccard similarity).

This step is performed using the FindNeighbors function, and takes as input the previously defined dimensionality of the dataset.

To cluster the cells, we next apply modularity optimization techniques such as the Louvain algorithm (default) or SLM, to iteratively group cells together, with the goal of optimizing the standard modularity function.

## Computing nearest neighbor graph
## Computing SNN
  • Number of chosen PCs: 15.
  • Chosen resolution: 0.1.

4.1 UMAP: Non-linear dimensionality reduction

Non-linear dimensionality reduction method UMAP was applied to visualize the cell clusters.

## 09:04:27 UMAP embedding parameters a = 0.9922 b = 1.112
## 09:04:27 Read 40333 rows and found 15 numeric columns
## 09:04:27 Using Annoy for neighbor search, n_neighbors = 30
## 09:04:27 Building Annoy index with metric = cosine, n_trees = 50
## 0%   10   20   30   40   50   60   70   80   90   100%
## [----|----|----|----|----|----|----|----|----|----|
## **************************************************|
## 09:04:31 Writing NN index file to temp file /var/folders/2z/rqg0lktx6nq0_wqgs1yrx0lw0000gn/T//RtmpkGPrC5/fileec9a427c8656
## 09:04:32 Searching Annoy index using 1 thread, search_k = 3000
## 09:04:45 Annoy recall = 100%
## 09:04:45 Commencing smooth kNN distance calibration using 1 thread
## 09:04:47 Initializing from normalized Laplacian + noise
## 09:04:52 Commencing optimization for 200 epochs, with 1684704 positive edges
## 09:05:14 Optimization finished

## Saving 7 x 5 in image
## Saving 7 x 5 in image

4.2 Find differentially expressed features

Finding markers (genes) that define clusters via differential gene expression expression. A table containing the top markers found for each of the clusters can be found under results/markers/.

## Calculating cluster 0
## Calculating cluster 1
## Calculating cluster 2
## Calculating cluster 3
## Calculating cluster 4
## Calculating cluster 5
## Calculating cluster 6
## Calculating cluster 7
## Calculating cluster 8

Expression heatmap for the top 20 markers for each cluster. All plots for the markers can be found under results/markers.

4.3 Plotting provided markers

All marker plots in this section can be found under results/markers.

4.3.1 Markers for the cell types

Choose the cell type:

Cholangiocytes

Macrophage

Kupffer cells

Hepatic Stellate

Provided Hepatic stellate cells markers: Des, Reln. Other markers in this cluster: Dcn, Gsn, Cxcl12, Lum

Endothelial cells

Fibroblasts

Hepatocytes

## Warning in FetchData(object = object, vars = c(dims, "ident", features), : The
## following requested variables were not found: Serpina 1b

Liver projenitor cells

Plotting provided liver progenitor cells (LPC) markers.

B-lymphocytes

T-lymphocytes

5 Cluster cell type assignment

UMAP plot with cell type assignments according to gene expression profiles of the clusters. All UMAP plots are found under results/umap.

## Saving 7 x 5 in image
## Saving 7 x 5 in image

Number of cells identified for each of the cell types:

Condition Cell_number
Cholangiocytes 12965
Endothelial cells 10879
Hepatocytes 10751
Macrophage / Kupffer 3931
HSC & PF 832
B-lymphocytes 521
T-lymphocytes 454

Heatmap with the cell types names:

Different representations of the UMAP plot. Choose the representation:

Samples

Grouping

Grouping split

6 Differential gene expression analysis

The differential gene expression analysis was performed for each of the cell types, by comparing the MDR2 KO to the WT mice population. Here the results tables are displayed for each cell population.

All tables for the Differential Gene Expression analysis can be found under results/DE_genes.

Cholangiocytes

p_val avg_logFC pct.1 pct.2 p_val_adj gene log_pval_adj
Trf 0 2.1110825 0.782 0.352 0 Trf Inf
Gc 0 1.5645869 0.659 0.199 0 Gc Inf
Pyy 0 -0.3077262 0.009 0.238 0 Pyy Inf
Tbxas1 0 -0.3112804 0.009 0.209 0 Tbxas1 Inf
mt-Co1 0 -0.4247504 0.998 0.996 0 mt-Co1 Inf
Fads1 0 -0.4436855 0.116 0.515 0 Fads1 Inf
Angptl1 0 -0.4575614 0.020 0.318 0 Angptl1 Inf
Ldlr 0 -0.5305169 0.075 0.485 0 Ldlr Inf
Mal 0 -0.5539402 0.175 0.670 0 Mal Inf
Ppy 0 -0.5594680 0.033 0.499 0 Ppy Inf
Sqle 0 -0.6364168 0.189 0.665 0 Sqle Inf
Hmgcr 0 -0.6596734 0.157 0.645 0 Hmgcr Inf
Cyp51 0 -0.6597784 0.158 0.626 0 Cyp51 Inf
Ces1c 0 -0.6866873 0.201 0.701 0 Ces1c Inf
Cldn10 0 -0.7309091 0.026 0.457 0 Cldn10 Inf
Fdft1 0 -0.7445015 0.214 0.756 0 Fdft1 Inf
Insig1 0 -0.8270130 0.052 0.591 0 Insig1 Inf
Txnip 0 -0.8511348 0.668 0.973 0 Txnip Inf
Fdps 0 -1.0167868 0.332 0.855 0 Fdps Inf
Ces1b 0 -1.0414215 0.299 0.830 0 Ces1b Inf
## When using repel, set xnudge and ynudge to 0 for optimal results

HSC & PF

p_val avg_logFC pct.1 pct.2 p_val_adj gene log_pval_adj
Cygb 0 0.8859691 0.924 0.670 0 Cygb 31.076482
Lum 0 0.6873188 0.918 0.697 0 Lum 17.039316
Gdf10 0 0.7975404 0.791 0.438 0 Gdf10 15.956817
Igfbp7 0 0.4474661 0.991 0.962 0 Igfbp7 15.273787
Tmem176a 0 0.5278784 0.957 0.838 0 Tmem176a 14.767810
Lpl 0 0.7019331 0.855 0.589 0 Lpl 14.528659
Igkc 0 -0.4590238 0.008 0.151 0 Igkc 13.358554
Serpina1c 0 -1.3229383 0.105 0.346 0 Serpina1c 11.620114
Itga8 0 0.5691132 0.743 0.411 0 Itga8 11.485966
Cpxm1 0 0.6316605 0.688 0.351 0 Cpxm1 11.252188
Smoc2 0 0.5514703 0.816 0.432 0 Smoc2 11.198486
Fabp1 0 -1.2516721 0.221 0.519 0 Fabp1 10.878564
Olfml3 0 0.5672079 0.805 0.524 0 Olfml3 10.712538
Ccdc80 0 0.5292820 0.861 0.638 0 Ccdc80 10.604431
C4b 0 0.7715481 0.643 0.314 0 C4b 10.221376
Plac8 0 0.4868106 0.700 0.378 0 Plac8 10.020312
Col3a1 0 0.4681979 0.947 0.681 0 Col3a1 9.759870
Nfib 0 0.4611013 0.915 0.681 0 Nfib 9.503193
Tmem176b 0 0.3896730 0.968 0.886 0 Tmem176b 9.283239
Eva1b 0 0.5255798 0.844 0.632 0 Eva1b 8.993763
## When using repel, set xnudge and ynudge to 0 for optimal results

Macrophage / Kupffer

p_val avg_logFC pct.1 pct.2 p_val_adj gene log_pval_adj
Cd163 0 -0.9332278 0.218 0.617 0 Cd163 137.26699
Cd63 0 0.9976663 0.675 0.333 0 Cd63 110.36191
Gm42418 0 -0.7163580 0.997 0.999 0 Gm42418 107.73632
Trem2 0 1.0060072 0.577 0.293 0 Trem2 86.61085
Cd209g 0 -0.5506183 0.107 0.375 0 Cd209g 77.26026
Cd209f 0 -0.5365644 0.146 0.440 0 Cd209f 74.41488
Cd52 0 0.6921410 0.580 0.301 0 Cd52 65.25843
Ccl6 0 -0.5979827 0.700 0.873 0 Ccl6 62.43041
Hpgd 0 -0.4425947 0.778 0.891 0 Hpgd 60.24869
C6 0 -0.5189466 0.454 0.706 0 C6 60.03574
Adgre4 0 -0.3388996 0.204 0.494 0 Adgre4 59.65777
B2m 0 -0.2740110 0.976 0.978 0 B2m 58.38489
Slc40a1 0 -0.5002085 0.784 0.862 0 Slc40a1 57.28906
Fcna 0 -0.5195819 0.650 0.825 0 Fcna 56.45639
H2-D1 0 -0.3024836 0.981 0.985 0 H2-D1 53.58588
Mmp12 0 1.3301896 0.287 0.062 0 Mmp12 51.55556
Klra2 0 -0.3427506 0.200 0.465 0 Klra2 51.50673
C4b 0 -0.3489368 0.065 0.246 0 C4b 51.44484
Cd9 0 0.5143578 0.373 0.121 0 Cd9 50.90544
Cxcl13 0 -0.5258395 0.021 0.151 0 Cxcl13 49.89640
## When using repel, set xnudge and ynudge to 0 for optimal results

Endothelial

p_val avg_logFC pct.1 pct.2 p_val_adj gene log_pval_adj
Gm42418 0 -0.5213225 0.996 0.999 0 Gm42418 300.0762
Fabp1 0 -0.9224697 0.278 0.578 0 Fabp1 256.3818
Serpina1b 0 -1.2592673 0.101 0.376 0 Serpina1b 248.2143
Alb 0 -1.0139004 0.170 0.453 0 Alb 241.9936
Serpina1c 0 -1.2179086 0.097 0.362 0 Serpina1c 233.8848
Sparc 0 0.3566259 0.971 0.912 0 Sparc 209.9261
Rbm3 0 0.5212910 0.499 0.228 0 Rbm3 199.2653
Apoa2 0 -0.8519819 0.201 0.455 0 Apoa2 187.3788
Tmem176a 0 0.6029185 0.419 0.172 0 Tmem176a 184.0255
Tmem176b 0 0.5771054 0.453 0.205 0 Tmem176b 175.6252
Selenop 0 -0.2909057 0.995 0.998 0 Selenop 167.3594
Lgals1 0 0.5146200 0.746 0.524 0 Lgals1 162.5212
Actg1 0 0.4223766 0.878 0.731 0 Actg1 151.0100
Cd36 0 0.5114801 0.638 0.414 0 Cd36 138.1584
Serpina1a 0 -0.8958642 0.063 0.229 0 Serpina1a 128.0064
Ass1 0 -0.9629828 0.122 0.309 0 Ass1 126.1331
Cd9 0 0.5629497 0.312 0.124 0 Cd9 123.9131
Gnmt 0 -0.9427198 0.084 0.253 0 Gnmt 116.9973
Apoc3 0 -0.7844741 0.095 0.261 0 Apoc3 110.6210
Gsta3 0 -0.7625480 0.103 0.267 0 Gsta3 104.5355
## When using repel, set xnudge and ynudge to 0 for optimal results

Hepatocytes

p_val avg_logFC pct.1 pct.2 p_val_adj gene log_pval_adj
Mup20 0 3.1331920 0.710 0.093 0 Mup20 Inf
Cib3 0 1.8517138 0.497 0.003 0 Cib3 Inf
Gstp1 0 1.7505891 0.849 0.445 0 Gstp1 Inf
Selenbp2 0 1.5684214 0.854 0.599 0 Selenbp2 Inf
Gm50136 0 1.1961122 0.434 0.042 0 Gm50136 Inf
Aox3 0 1.0877273 0.532 0.188 0 Aox3 Inf
Nudt7 0 1.0662149 0.862 0.683 0 Nudt7 Inf
Mup21 0 0.9360100 0.306 0.045 0 Mup21 Inf
Car3 0 0.5058082 0.997 0.997 0 Car3 Inf
Hpd 0 -0.4276048 0.970 0.989 0 Hpd Inf
Serpina1d 0 -0.6515106 0.628 0.860 0 Serpina1d Inf
Asl 0 -0.7051982 0.739 0.888 0 Asl Inf
Mat1a 0 -0.7179143 0.640 0.842 0 Mat1a Inf
Gm42418 0 -0.7330641 0.951 0.993 0 Gm42418 Inf
Thrsp 0 -0.8038894 0.663 0.877 0 Thrsp Inf
Serpina1c 0 -0.9547883 0.935 0.994 0 Serpina1c Inf
Gstt3 0 -0.9806314 0.112 0.497 0 Gstt3 Inf
Serpina1a 0 -1.0050085 0.858 0.983 0 Serpina1a Inf
Serpina1b 0 -1.1255318 0.918 0.997 0 Serpina1b Inf
Hamp2 0 -1.1664698 0.089 0.473 0 Hamp2 Inf
## When using repel, set xnudge and ynudge to 0 for optimal results

Saving R Seurat object:

7 Methods

For the single-cell data analysis the R package Seurat v3.2.2 was employed. Graphs were produced in RStudio with R version 4.0.3 (2020-10-10) mainly using the R package ggplot2 v3.3.2. Final reports were produced using the R package rmarkdown v2.6, with knitr v1.30.